From Key Points to Key Point Hierarchy: Structured and Expressive Opinion Summarization
Key Point Analysis (KPA) has been recently proposed for deriving fine-grained
insights from collections of textual comments. KPA extracts the main points in
the data as a list of concise sentences or phrases, termed key points, and
quantifies their prevalence. While key points are more expressive than word
clouds and key phrases, making sense of a long, flat list of key points, which
often express related ideas in varying levels of granularity, may still be
challenging. To address this limitation of KPA, we introduce the task of
organizing a given set of key points into a hierarchy, according to their
specificity. Such hierarchies may be viewed as a novel type of Textual
Entailment Graph. We develop ThinkP, a high quality benchmark dataset of key
point hierarchies for business and product reviews, obtained by consolidating
multiple annotations. We compare different methods for predicting pairwise
relations between key points, and for inferring a hierarchy from these pairwise
predictions. In particular, for the task of computing pairwise key point
relations, we achieve significant gains over existing strong baselines by
applying directional distributional similarity methods to a novel
distributional representation of key points, and further boost performance via
weak supervision. Comment: ACL 202
CHAMP: Efficient Annotation and Consolidation of Cluster Hierarchies
Various NLP tasks require a complex hierarchical structure over nodes, where
each node is a cluster of items. Examples include generating entailment graphs,
hierarchical cross-document coreference resolution, annotating event and
subevent relations, etc. To enable efficient annotation of such hierarchical
structures, we release CHAMP, an open source tool for incrementally
constructing both clusters and hierarchy simultaneously over any type of text.
This incremental approach significantly reduces annotation time compared to the
common pairwise annotation approach and also guarantees maintaining
transitivity at the cluster and hierarchy levels. Furthermore, CHAMP includes a
consolidation mode, where an adjudicator can easily compare multiple cluster
hierarchy annotations and resolve disagreements. Comment: EMNLP 202
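As a minimal sketch of why incremental construction can preserve transitivity by design, the hierarchy can be stored as parent links between clusters, so that the ancestor relation is derived rather than annotated pair by pair. This is not CHAMP's actual implementation: the class `ClusterHierarchy`, its methods, and the example texts are all hypothetical names chosen for illustration.

```python
from __future__ import annotations


class ClusterHierarchy:
    """Hypothetical forest of clusters (an assumption, not CHAMP's data model)."""

    def __init__(self) -> None:
        self.items: dict[str, set[str]] = {}      # cluster id -> member texts
        self.parent: dict[str, str | None] = {}   # cluster id -> parent cluster id

    def add_cluster(self, cid: str, members: set[str],
                    parent: str | None = None) -> None:
        # A new cluster may only attach to an existing one, so no cycle can form.
        if cid in self.items:
            raise ValueError(f"cluster {cid!r} already exists")
        if parent is not None and parent not in self.items:
            raise KeyError(f"unknown parent {parent!r}")
        self.items[cid] = set(members)
        self.parent[cid] = parent

    def add_item(self, cid: str, item: str) -> None:
        # Adding an item to a cluster leaves the hierarchy untouched.
        self.items[cid].add(item)

    def ancestors(self, cid: str) -> list[str]:
        # Walk the parent links up to the root, nearest ancestor first.
        chain: list[str] = []
        node = self.parent[cid]
        while node is not None:
            chain.append(node)
            node = self.parent[node]
        return chain

    def is_descendant(self, child: str, ancestor: str) -> bool:
        # Transitive by construction: derived from the chain, never stored per pair.
        return ancestor in self.ancestors(child)


h = ClusterHierarchy()
h.add_cluster("food", {"The food is good"})
h.add_cluster("pizza", {"Great pizza"}, parent="food")
h.add_cluster("crust", {"The crust is crispy"}, parent="pizza")
h.add_item("pizza", "Loved the pizza")
print(h.ancestors("crust"))
print(h.is_descendant("crust", "food"))
```

Under this representation, placing a cluster requires one decision (its parent), whereas a pairwise scheme would need a separate judgment for every pair of clusters and could produce judgments that contradict each other.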
Probing the local structure: macromolecular combs in external fields
Recent experimental methods make it possible to monitor the response of
macromolecules to locally applied fields, complementing the usual mesoscopic
techniques. Based on the Rouse model and its extension to generalized Gaussian
structures (GGS), we follow here the stretching of comb macromolecules under
local fields. This yields a wealth of information about the structure: given the
inhomogeneous architecture of combs, the dynamics and amount of stretching
depend strongly on the position of the monomer on which the external fields
act. We discuss both the theoretical and the experimental implications of our
findings, given that micromanipulations can be supplemented by fluorescence
measurements, which are very sensitive to changes in the intramolecular
distances. Comment: 16 pages, 5 pdf figures, to appear in Chem. Phy
Part-of-Speech Tagging of Modern Hebrew Text
Words in Semitic texts often consist of a concatenation of word segments, each corresponding to a Part-of-Speech (POS) category. Semitic words may be ambiguous with regard to their segmentation as well as to the POS tags assigned to each segment. When designing POS taggers for Semitic languages, a major architectural decision concerns the choice of the atomic input tokens (terminal symbols). If the tokenization is at the word level the output tags must be complex, and represent both the segmentation of the word and the POS tag assigned to each word segment. If the tokenization is at the segment level, the input itself must encode the different alternative segmentations of the words, while the output consists of standard POS tags. Comparing these two alternatives is not trivial, as the choice between them may have global effects on the grammatical model. Moreover, intermediate levels of tokenization between these two extremes are conceivable, and, as we will aim to show, beneficial. To the best of our knowledge, the problem of tokenization for POS tagging of Semitic languages has not been addressed before in full generality. In this paper, we study this problem for the purpose of POS tagging of Modern Hebre
Choosing an optimal architecture for segmentation and POS-tagging of Modern Hebrew
A major architectural decision in designing a disambiguation model for segmentation and Part-of-Speech (POS) tagging in Semitic languages concerns the choice of the input-output terminal symbols over which the probability distributions are defined. In this paper we develop a segmenter and a tagger for Hebrew based on Hidden Markov Models (HMMs). We start out from a morphological analyzer and a very small morphologically annotated corpus. We show that a model whose terminal symbols are word segments (i.e., morphemes) is advantageous over a word-level model for the task of POS tagging. However, for segmentation alone, the morpheme-level model has no significant advantage over the word-level model. Error analysis shows that neither model is adequate for resolving a common type of segmentation ambiguity in Hebrew: whether or not a word in a written text is prefixed by a definiteness marker. Hence, we propose a morpheme-level model where the definiteness morpheme is treated as a possible feature of morpheme terminals. This model exhibits the best overall performance, both in POS tagging and in segmentation. Despite the small size of the annotated corpus available for Hebrew, the results achieved using our best model are on par with recen